An Efficient Parameter-Free Method for Large Scale Offline Learning
Abstract
With the rapid growth of computer storage capacities, the volume of available data and the demand for scoring models are both growing faster than processing power. However, the main obstacle to the widespread adoption of data mining solutions is the stagnant supply of skilled data analysts, who play a key role in data preparation and model selection. In this paper we present a parameter-free, scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and averaging of models using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with datasets that are far larger than the available main memory. Finally, we report results on the Large Scale Learning challenge, where our method obtains state-of-the-art performance within practical computation time.
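To make the out-of-core idea concrete, the following is a minimal sketch of a naive Bayes classifier whose sufficient statistics (counts) are accumulated one chunk at a time, so the full dataset never has to sit in memory. It is not the method of the paper: the class `StreamingNaiveBayes`, its method names, the Laplace smoothing, and the optional per-feature `weights` are all assumptions made for illustration. The weights only hint at how variable selection (weight 0 drops a variable) or model averaging could enter the score, and the sketch assumes features are already categorical rather than using the paper's density estimators.

```python
import math
from collections import defaultdict


class StreamingNaiveBayes:
    """Illustrative naive Bayes over categorical features, trained chunk by chunk.

    Hypothetical sketch: not the estimator, selection scheme, or averaging
    procedure described in the abstract.
    """

    def __init__(self, weights=None):
        self.class_counts = defaultdict(int)  # label -> number of rows
        self.counts = defaultdict(int)        # (feature index, value, label) -> count
        self.values = defaultdict(set)        # feature index -> distinct values seen
        self.weights = weights                # optional per-feature weights (assumption)

    def partial_fit(self, rows, labels):
        """Update the counts with one chunk; only the counts stay in memory."""
        for row, y in zip(rows, labels):
            self.class_counts[y] += 1
            for j, v in enumerate(row):
                self.counts[(j, v, y)] += 1
                self.values[j].add(v)

    def log_scores(self, row):
        """Return the unnormalized log posterior of each class for one row."""
        total = sum(self.class_counts.values())
        scores = {}
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / total)
            for j, v in enumerate(row):
                w = 1.0 if self.weights is None else self.weights[j]
                k = len(self.values[j])
                # Laplace-smoothed estimate of P(feature j = v | class y).
                p = (self.counts.get((j, v, y), 0) + 1) / (n_y + k)
                score += w * math.log(p)
            scores[y] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)
```

In use, each chunk read from disk would be passed to `partial_fit` in turn; memory then grows with the number of distinct (feature, value, class) combinations rather than with the number of rows.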
Similar resources
Offline Language-free Writer Identification based on Speeded-up Robust Features
This article proposes offline language-free writer identification based on speeded-up robust features (SURF), which goes through training, enrollment, and identification stages. In all stages, an isotropic Box filter is first used to segment the handwritten text image into word regions (WRs). Then, the SURF descriptors (SUDs) of each word region and the corresponding scales and orientations (SOs) are extr...
Influences of Small-Scale Effect and Boundary Conditions on the Free Vibration of Nano-Plates: A Molecular Dynamics Simulation
This paper addresses the influence of boundary conditions and small-scale effect on the free vibration of nano-plates using molecular dynamics (MD) and nonlocal elasticity theory. For the MD simulations, the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is used to obtain fundamental frequencies of single layered graphene sheets (SLGSs), which are modeled in this paper as the mo...
Use of the Shearlet Transform and Transfer Learning in Offline Handwritten Signature Verification and Recognition
Despite the rapid growth of technology, the handwritten signature remains users' first choice among biometrics. In this paper, a new methodology for offline handwritten signature verification and recognition based on the Shearlet transform and transfer learning is proposed. Since a large percentage of handwritten signatures are composed of curves and the performance of a sig...
A Trust Region Algorithm for Solving Nonlinear Equations (RESEARCH NOTE)
This paper presents a practical and efficient method to solve large-scale nonlinear equations. The global convergence of this new trust region algorithm is verified. The algorithm is then used to solve the nonlinear equations arising in an Expanded Lagrangian Function (ELF). Numerical results for the implementation of some large-scale problems indicate that the algorithm is efficient for these ...
Large-scale Inversion of Magnetic Data Using Golub-Kahan Bidiagonalization with Truncated Generalized Cross Validation for Regularization Parameter Estimation
In this paper a fast method for large-scale sparse inversion of magnetic data is considered. The L1-norm stabilizer is used to generate models with sharp and distinct interfaces. To deal with the non-linearity introduced by the L1-norm, a model-space iteratively reweighted least squares algorithm is used. The original model matrix is factorized using the Golub-Kahan bidiagonalization that proje...